
CentML AI Inference Provider Integration #810

Open
wants to merge 4 commits into base: main

Conversation


@V2arK V2arK commented Jan 17, 2025

What does this PR do?

Add CentML as a Remote Inference Provider in llama-stack.

This PR integrates CentML into llama-stack, enabling users to utilize CentML's models (meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.1-405B-Instruct-FP8) for inference tasks like chat and text completions.

Right now, only conda deployments are supported: build with llama stack build --template centml --image-type conda, run with llama stack run run.yaml --port <PORT> --env CENTML_API_KEY=<API_KEY>, and then use llama-stack-client to perform any inference workload as needed (a minimal example is sketched below).
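
For reference, a minimal inference call against the running stack could look like the sketch below. This is an illustration only: it assumes the stack is listening on localhost at the port chosen above, and the exact llama-stack-client argument and response field names may vary slightly between client versions.

from llama_stack_client import LlamaStackClient

# Point the client at the locally running stack (use the port passed to "llama stack run").
client = LlamaStackClient(base_url="http://localhost:5000")

# Basic, non-streaming chat completion against one of the CentML-served models.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response.completion_message.content)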

Key Changes:

  • Added CentML as a remote inference provider with model support for meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.1-405B-Instruct-FP8 (a quick model-listing check is sketched below).
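
As a quick sanity check of the provider registration (not part of this PR's automated tests), the registered models can be listed through llama-stack-client. A sketch, assuming the stack was built from the centml template and is running locally:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# List the models registered with the running stack; the two CentML-served
# models should appear among them.
for model in client.models.list():
    print(model.identifier)

# Expected to include:
#   meta-llama/Llama-3.3-70B-Instruct
#   meta-llama/Llama-3.1-405B-Instruct-FP8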

Addresses issue #809


Test Plan

The integration tests are only run against the dev cluster because of model availability. In addition, the completion endpoint and the logprobs / tool-calling abilities are not yet implemented on the CentML side, so those tests fail and there is currently no point in altering them. Please see the manual testing notes below on the integration tests for more details.

  • Manual Testing:

    • Changed to the development cluster for testing with llama_3b (the production cluster only has the 70B and 405B models).
    • Right now only basic chat completion and model listing are supported.
    • No support yet for tool-calling / completion / logprobs.

Passed test:

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-centml] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-centml] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-centml] PASSED

Reproduction Instructions:

(P.S.: the tests will only run against the development cluster because of model availability; this PR is not configured to point to it by default.)

  1. Change the URL to point to the CentML development cluster.
  2. Run the tests on the dev cluster; the results are shown above:
    pytest -s -v --providers inference=centml llama_stack/providers/tests/inference/test_text_inference.py  -m "llama_3b" --env CENTML_API_KEY=<API_KEY>
  3. Perform inference (screenshot of a successful chat completion omitted; see the client sketch earlier in this description).

Sources

  • Related Issue: #809


Before submitting

@facebook-github-bot

Hi @V2arK!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 17, 2025
@V2arK V2arK marked this pull request as ready for review January 21, 2025 22:03